Abstract

Random cost simulations were introduced as a method to investigate optimization
problems in systems with conflicting constraints. Here I study the approach in connection
with the training of a feed-forward multilayer perceptron, as used in high energy physics
applications. It is suggested to use random cost simulations for generating a set of selected
configurations. On each of those, a final minimization may then be performed by a standard
algorithm. For the training example at hand many almost degenerate local minima are thus
found. Some effort is spent on discussing whether they lead to equivalent classifications of
the data.
1 Introduction



Recently there has been some interest [1, 2, 3, 4, 5] in Monte Carlo (MC) sampling from 
Broad Energy Distributions (BED). The basic idea is about twenty years old and was first 
introduced under the name Umbrella Sampling [6]. The increased interest in related methods 
began with the success of Multicanonical Sampling [1] in the study of first order phase 
transitions. The name multicanonical emphasizes the possibility of obtaining from one sample
canonical expectation values over a temperature range. Soon a wide range of applications was
realized. In particular it was stressed that the algorithmic ergodicity becomes enhanced by
sampling with BED. This has led to new perspectives concerning numerical investigations
of systems with conflicting constraints, like for instance spin glasses [2, 4], proteins [7] or the 
traveling salesman problem [8]. 

A complication of these approaches is that they sample with weight factors $w(E)$ which
are a priori unknown functions of the energy $E$. It is part of the algorithm's purpose to
converge to a suitable approximation, which then allows one to estimate the spectral density
$p(E)$. In practice complications emerge which are unknown for canonical MC simulations,
where the correct weights are given by the Boltzmann factor $w_B(E) = \exp(-\beta E)$.

The Random Cost (RC) method [9] samples a BED without the need of tedious recursions 
towards appropriate weight factors. This is achieved by employing simple master equations 
to enforce a random walk in a given cost function, for instance in the energy of a statistical 
mechanics system. The price paid is that one no longer samples with weights which
depend only on the energy (i.e. the cost function). Consequently the ability to construct
canonical expectation values is lost. This disadvantage is presumably of minor importance
in applications to hard optimization problems, where one is mainly interested in an overview 
of the minima of the system and less in its statistical mechanics. RC may then compete with 
approaches like simulated annealing [10] or genetic algorithms [11]. 

In ref. [9] the RC method was illustrated for an artificially simple cost function. Since
then, no new experience has been reported. One reason, as we shall see, is that implementing the
method in more realistic situations is not entirely straightforward. There is a large amount
of innovative freedom in setting up the random walk master equations. Realistic applications
require some decisions to be made, and wrong ones render the algorithm ineffective.

In the present paper I focus on applying the basic ideas to the training of Neural Networks 
(NN). In high energy physics NN constitute powerful nonlinear extensions of conventional 
data analysis methods, see [12, 13, 14] and references therein. In the context of this paper 
the purpose of the NN is to illustrate (a) how the RC method works and (b) how it may lead 
to interesting new physical insight. The training of a feed-forward two-layer perceptron to 
search for top quark production in "all-jet" channels [16] is considered. The RC simulations 
yield a large number of local minima, which are well-separated in parameter space. This
allows one to address relevant questions like:

(i) Is one global minimum dominating or are there many almost degenerate minima? 

In case of many almost degenerate minima: 

(ii) What is their distribution in parameter space? 

(iii) Do different minima lead to the same, or at least to similar, classifications of the data 
into events and background? 

The NN and the training data are described in the next section. In section 3 the RC
method is outlined in some detail. Section 4 is devoted to numerical results and their
interpretation. On their basis, I conclude that RC is a promising method in the context of
exploring NN minima. One may hope for considerable further improvements by exploiting
the innovative freedoms of the method more efficiently. Summary and conclusions are given
in section 5.
2 The Training Example



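We shall consider the training of a feed-forward two-layer perceptron for $t\bar{t}$ detection
through b-quark tagging with soft muons [16]. The network function is defined by

$$ Y_k = g\left( \sum_{i=1}^{m} \omega^2_i \, g\left( \sum_{j=1}^{n} \omega^1_{ij} d^k_j + \theta^1_i \right) + \theta^2 \right) , \quad \text{where} \quad g(x) = \frac{1}{1 + \exp(-2x)} . \qquad (1) $$

Here $d^k$, $(k = 1, ..., N_d)$ are experimental data and $\omega^2_i$, $\theta^2$, $\omega^1_{ij}$, $\theta^1_i$ are the
parameters of this network. In our example $m = 5$ and $n = 4$. Hence, there are 5 $\omega^2_i$,
1 $\theta^2$, 20 $\omega^1_{ij}$ and 5 $\theta^1_i$. This leaves us with 31 parameters which, generically, will now
be denoted by $x = (x_j)$, $(j = 1, ..., 31)$. The aim of a training program is to minimize the
mean square error

$$ E_2 = \frac{1}{N_d} \sum_{k=1}^{N_d} \left( Y_k - \theta(N_b - k) \right)^2 . \qquad (2) $$
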
The function $Y_k$ itself is not binary, but has the useful property that (under certain
conditions) it can be interpreted as a Bayesian a posteriori probability [15]. We shall use
$N_d = 5000$ data $d^k$ to train the network. For $k = 1, ..., 2500$ they are from the D0 50K
sample [16, 17], and used to train the NN for background. This is achieved by choosing
$N_b = 2500.5$, i.e. $\theta(N_b - k) = 1$. (The likelihood that a data point describes an event is less
than 1/1000 for these data.) For $k = 2501, ..., 5000$ the data are MC generated events from
the ISA180_ALL.HBOOK sample. Each data point is a standard AllJets 4-tuple [16]

$$ d_j = (C, \, APL, \, NJl/10, \, HT3/500), \quad (j = 1, 2, 3, 4). $$

The symbols stand for the following global event quantities: C = centrality, APL = aplanarity,
NJl = average jet count, and HT3 = sum of jet $E_T$ excluding the first two jets.
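
As an illustration, equations (1) and (2) may be sketched in a few lines of Python. This is a minimal sketch and not the program used for this paper: the function names, the argument layout and the toy data are hypothetical, the thresholds are assumed to enter additively, and the error is assumed to be normalized by $N_d$.

```python
import math

def g(x):
    """Sigmoid of eq. (1): g(x) = 1 / (1 + exp(-2x))."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def network_output(d, w1, th1, w2, th2):
    """Two-layer perceptron output Y for one data point d (n inputs, m hidden nodes)."""
    hidden = [g(sum(w1[i][j] * d[j] for j in range(len(d))) + th1[i])
              for i in range(len(w1))]
    return g(sum(w2[i] * hidden[i] for i in range(len(hidden))) + th2)

def mean_square_error(data, n_background, params):
    """E_2 of eq. (2): the target is 1 for the first n_background points, else 0."""
    total = 0.0
    for k, d in enumerate(data, start=1):
        target = 1.0 if k <= n_background else 0.0
        total += (network_output(d, *params) - target) ** 2
    return total / len(data)

# Toy usage with m = 5 hidden nodes and n = 4 inputs; all parameters set to zero.
m, n = 5, 4
params = ([[0.0] * n for _ in range(m)], [0.0] * m, [0.0] * m, 0.0)
n_params = m * n + m + m + 1             # 20 + 5 + 5 + 1 parameters
toy_data = [(0.5, 0.1, 0.6, 0.3)] * 4    # hypothetical 4-tuples, not real data
e2 = mean_square_error(toy_data, 2, params)
```

With all parameters at zero every output equals $g(0) = 1/2$, so each data point contributes $1/4$ to the sum regardless of its target.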



3 Random Cost Simulations 

We are interested in finding many (local) minima of a function

$$ f = f(x) \quad \text{where} \quad x = (x_j) = (x_1, ..., x_n) , $$

and in that process possibly its global minimum. In hard optimization problems (problems
with conflicting constraints) it happens that one has to overcome barriers (local increases)
of the function $f(x)$ before convergence into globally interesting minima is achieved. The
purpose of the RC method is to overcome such barriers through a stochastic process. In
essence the method is described by the following four steps.

(i) Generate randomly a set of update proposals for the argument: $\{\Delta x_{(k)}\}$. (Here we
distinguish different function arguments by subscripts in parentheses, like $x_{(k)}$, whereas
components of the argument are singled out through subscripts without parentheses,
like $x_j$.)

(ii) Calculate the function changes $\Delta f_{(k)} = f(x + \Delta x_{(k)}) - f(x)$.

(iii) Divide the update proposals into three subsets. First, $\{\Delta x^+_{(k)}\}$ and $\{\Delta x^-_{(k)}\}$ are
defined such that $\Delta f^+_{(k)} > \Delta f_{\min}$ and $\Delta f^-_{(k)} < -\Delta f_{\min}$ hold for the corresponding
function changes. Here $\Delta f_{\min} > 0$ is some (small) cut-off. All update proposals with
$|\Delta f^0_{(k)}| \le \Delta f_{\min}$ form the third set, $\{\Delta x^0_{(k)}\}$.

(iv) When both the $\{\Delta x^+_{(k)}\}$ and the $\{\Delta x^-_{(k)}\}$ sets are non-empty: updates from these
sets are chosen according to a probabilistic law which enforces a random walk in the
function value $f$. (In case one of these sets is empty, violations may be allowed.)

In this paper the set of update proposals is defined as follows. Each component $x_j$ is
restricted to the same range $|x_j| < x_{\max}$. Allowed are updates in steps of $\Delta x_{ij}$ with

$$ \Delta x_{ij} = {\rm sign}(i) \, 2^{-|i|} \, x_{\max} , \quad \text{with} \quad i = \pm 1, \pm 2, ..., \pm i_{\max} . $$

The subscript $i$ labels the stepsize and sign, whereas $j$ picks a component of $x$. The updates
are thus confined to a grid. The minimum grid length $\Delta x_{\min}$ is determined by the choice of
$i_{\max}$. I would like to emphasize that my choice of update proposals is neither unique nor
claimed to be particularly efficient. The method allows for all kinds of choices and presently
it is unclear by which criteria efficient ones may be singled out.
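
The grid of update proposals can be made concrete with a short Python sketch. The function names are hypothetical, and discarding proposals that would leave the allowed range is an assumption of this sketch, since the treatment of the boundary is not spelled out here.

```python
def update_grid(x_max, i_max):
    """All step sizes sign(i) * 2**(-|i|) * x_max for i = +-1, ..., +-i_max."""
    steps = []
    for i in range(1, i_max + 1):
        size = x_max * 2.0 ** (-i)
        steps.extend([size, -size])
    return steps

def proposals(x, x_max, i_max):
    """(j, dx) pairs; moves leaving the range |x_j| <= x_max are dropped (assumption)."""
    return [(j, dx)
            for j in range(len(x))
            for dx in update_grid(x_max, i_max)
            if abs(x[j] + dx) <= x_max]

steps = update_grid(2.5, 12)
smallest = min(abs(s) for s in steps)          # minimum grid length for this choice
all_moves = proposals([0.0] * 31, 2.5, 12)     # 31 parameters, all starting at zero
```

For $x_{\max} = 2.5$ and $i_{\max} = 12$ the smallest step is $2.5 / 2^{12} = 5 / 2^{13}$, and the same minimum grid length results for $x_{\max} = 10$ with $i_{\max} = 14$.
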
In [9] the entire $\Delta f_{ij}$ array was calculated for each RC update. For the present, more
realistic, cost function the computational effort then becomes considerable. Fortunately, it
turns out to be rather straightforward to invent modified updating procedures which are far
less CPU time intensive. The simulations of the next section rely on the following one:

Elements of the $\Delta x_{ij}$ array are picked at random [18] and the corresponding $\Delta f_{ij}$ elements
are calculated. As soon as $\Delta f_{ij} > \Delta f_{\min}$ and $\Delta f_{i'j'} < -\Delta f_{\min}$ elements are found, the RC
update is performed. Let us first assume that this happens before the entire array $\Delta x_{ij}$ is
exhausted. Then either for $\Delta f_{ij} > \Delta f_{\min}$ or for $\Delta f_{i'j'} < -\Delta f_{\min}$ there will be precisely
one proposal. From the other set, one element is picked at random. As the elements were
already picked at random, it is sufficient to choose the last element. This means we have
two definite updating proposals

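$\Delta x^-$ corresponding to $\Delta f^- < -\Delta f_{\min}$

and

$\Delta x^+$ corresponding to $\Delta f^+ > \Delta f_{\min}$ .

The RC equation is then simply

$$ p^- \, |\Delta f^-| = p^+ \, |\Delta f^+| , \qquad (3) $$

where $p^-$ and $p^+ = 1 - p^-$ are the probabilities of accepting the $\Delta x^-$ and the $\Delta x^+$
update. This equation is easily solved for, say,

$$ p^- = \frac{|\Delta f^+|}{|\Delta f^-| + |\Delta f^+|} . \qquad (4) $$

A random number $x_r$, uniformly distributed in the range $0 < x_r < 1$, is then chosen. For
$x_r \le p^-$ the $\Delta x^-$ update is accepted, otherwise the $\Delta x^+$ update.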
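
The acceptance rule can be checked numerically. The following Python sketch uses hypothetical names and illustrative values for the two function changes; it computes the probability of the downhill update and verifies that the expected change of the function value vanishes, which is what makes the walk in $f$ unbiased.

```python
import random

def rc_choose(df_minus, df_plus, rng=random):
    """Pick the downhill ('minus') or uphill ('plus') proposal with zero mean drift in f."""
    assert df_minus < 0.0 < df_plus
    p_minus = df_plus / (df_plus - df_minus)   # equals |df+| / (|df-| + |df+|)
    choice = "minus" if rng.random() <= p_minus else "plus"
    return choice, p_minus

# Illustrative numbers only: a downhill change of -0.02 and an uphill change of +0.06.
choice, p_minus = rc_choose(-0.02, 0.06)
expected_drift = p_minus * (-0.02) + (1.0 - p_minus) * 0.06
```

The smaller of the two changes is chosen more often, here with probability 3/4 for the downhill step, so that on average the function value neither rises nor falls.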

When the entire set $\Delta x_{ij}$ leads only to updates with either $\Delta f_{ij} > -\Delta f_{\min}$ or
$\Delta f_{ij} < \Delta f_{\min}$, we have found a local minimum or maximum. To be precise, we have found a
local minimum or maximum within the precision imposed by the cut-off choice $\Delta f_{\min}$. In
its present implementation the simulation continues by accepting the last proposed $\Delta x_{ij}$.
The function values $f$ will perform a random walk between the thus defined local minima and
maxima. If $\Delta f$ is a typical stepsize and $f_{\max} - f_{\min}$ a typical distance between a local
maximum and a local minimum, the simulation will need of the order $|f_{\max} - f_{\min}|^2 / |\Delta f|^2$
steps to get from one side to the other. Here $|\Delta f|$ is bounded from below by $\Delta f_{\min}$. Related
to this, too small a choice of $\Delta f_{\min}$ renders the simulation inefficient. Instead of aiming
at reaching local minima with high precision, it is here suggested to record the time series
for a reasonable choice of $\Delta f_{\min}$. Many independent regions of configuration space are then
reached. Independent minima of the time series are subsequently taken as starting points
for one of the conventional [19] downward minimization algorithms.
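
The modified updating procedure, including the fallback at a local extremum, may be sketched as follows. This is a hedged illustration in Python and not the Fortran program of this paper: the function name, the representation of proposals as (component, step) pairs and the toy cost function are assumptions, and a non-empty proposal list is assumed.

```python
import random

def rc_update(f, x, moves, df_min, rng):
    """One RC update of the list x; moves is a non-empty list of (j, dx) proposals."""
    f0 = f(x)
    order = moves[:]
    rng.shuffle(order)                 # proposals are examined in random order
    up = down = last = None
    for j, dx in order:
        trial = x[:]
        trial[j] += dx
        df = f(trial) - f0
        last = (j, dx)
        if df > df_min:
            up = (j, dx, df)           # keep the most recent uphill proposal
        elif df < -df_min:
            down = (j, dx, df)         # keep the most recent downhill proposal
        if up and down:
            break                      # stop as soon as both directions are available
    if up and down:
        p_minus = up[2] / (up[2] - down[2])
        j, dx, _ = down if rng.random() <= p_minus else up
    else:
        j, dx = last                   # extremum within df_min: accept the last proposal
    x[j] += dx
    return x

rng = random.Random(0)
x = rc_update(lambda v: v[0] ** 2, [1.0], [(0, 0.5), (0, -0.5)], 1e-3, rng)
```

As a purely illustrative number for the random walk estimate: with $|f_{\max} - f_{\min}| = 0.4$ and a typical $|\Delta f| = 0.01$ one expects of the order $0.4^2 / 0.01^2 = 1600$ updates to travel from a maximum to a minimum.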

One may further restrict the RC simulations by imposing additional bounds. For instance
one may reject all updates which lead to a function value larger than an imposed
maximum $f_{\max} > 0$. Or one may reject all updates with $\Delta f > \Delta f_{\max}$ where, of course,
$\Delta f_{\max} > \Delta f_{\min} > 0$. Some experience with such bounds is reported in the next section. As
a general rule, I would suggest that upper bounds on the function value should be imposed
in a stochastic way by modifying the RC equation (3) in favor of one direction.

4 Numerical Results 

Results from RC simulations of the NN error function (2) are now reported. Algorithmic 
performance and applications of physical relevance are treated in different subsections. 

4.1 Algorithmic performance 

For the parameters, discussed in the previous section, the following choices are made: 



$$ \Delta f_{\min} = 10^{-[\ldots]} , \quad \Delta f_{\max} = 0.1 , $$

and no upper bound $f_{\max}$. Further

$$ |x_j| < x_{\max} = 2.5 \ \text{and} \ i_{\max} = 12 , \qquad |x_j| < x_{\max} = 10 \ \text{and} \ i_{\max} = 14 $$

were tested. In each case $\Delta x_{\min} = 5/2^{13} \approx 0.00061$. Distribution functions are defined by

$$ F(E_2) = \int_{-\infty}^{E_2} p(E_2') \, dE_2' , $$

where $p(E_2)$ is the corresponding probability density. In practice estimators are obtained
by simply sorting [19] the sampled values of $E_2$. Plotting distribution functions, instead of
histograms of the probability density $p(E_2)$, has the advantage that one need not worry
about an appropriate bin size. Figure 1 compares (for two $x_j$ ranges) the distribution
functions from RC simulations with those from random sampling (RS). Here an RS configuration
is defined by choosing for each parameter a random number, uniformly distributed in the
allowed range.
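
Estimating a distribution function by sorting takes only a few lines. The Python sketch below uses hypothetical function names and made-up sample values; it assigns the value $i/N$ to the $i$-th smallest of $N$ sampled $E_2$ values and also counts the fraction of samples below a threshold.

```python
def empirical_distribution(samples):
    """Sorted sample values paired with the estimate F = i/N at the i-th smallest value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]

def fraction_below(samples, threshold):
    """Fraction of samples with a value strictly below the threshold."""
    return sum(1 for s in samples if s < threshold) / len(samples)

e2_samples = [0.45, 0.12, 0.50, 0.18, 0.30]    # made-up values, not simulation output
points = empirical_distribution(e2_samples)
low_fraction = fraction_below(e2_samples, 0.2)
```

No bin size enters anywhere, which is the advantage over a histogram of the probability density.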

The reader should focus on the behavior of $F(E_2)$ for small $E_2$ values. The RC distributions
show a sharp increase: for $|x_j| < 2.5$ about 20% of the configurations are generated
in the range $E_2 < 0.2$. For $|x_j| < 10$ this value even exceeds 30%, implying
that this is the preferable RC parameter choice. In contrast to RC simulations, RS generates
almost no configurations in the $E_2 < 0.2$ range. It is remarkable that for RS the
parameter range $|x_j| < 2.5$ is preferable. For $|x_j| < 10$ most RS configurations exhibit $E_2$
values very close to 1/2.

Concerning the RC results, it should be noticed that the distribution functions are not
straight lines, because the magnitude of a typical change $\Delta E_2$ ($E_2$ is the function
$f$ of section 3) depends on $E_2$. In the neighbourhood of local minima (in the sense of the
algorithm) $\Delta E_2$ proposals become small and the algorithm spends more time there. Of
course, configurations related by small $\Delta E_2$ changes are strongly correlated.

To find out how many independent minima are generated, I depict in figure 2 the RC time
series for the better parameter choice ($|x_j| < 10$). Altogether 100,000 RC updates (changes
of a single parameter) are performed. For each 1000 updates the minimum and maximum
values reached are plotted and, in order of their occurrence in the time series, connected by
straight lines to guide the eye. Autocorrelations are clearly visible, but at the same time it
becomes clear that a large number of independent minima (certainly > 20) are created.
Independent minima may be singled out by requiring that the time series went over some
cut-off barrier $E_2$ between subsequently recorded minima. From figure 2 as well as from
the nature of the problem it is clear that $E_2 = 0.4$ is a reasonably high choice. The lowest
twenty minima left over then are depicted in figure 3 together with the lowest twenty minima 
obtained by creating 10^ RS configurations. On a DEC 3000 Alpha 600 workstation the CPU 
time needed for the 100,000 RC updates was 12.2 hours and the CPU time to create 10^ RS 
configurations was about 13 hours. It is obvious that RC easily outperforms RS also when 
autocorrelations are taken into account. 
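
The barrier criterion for singling out independent minima can be sketched in Python as follows. The function name and the short time series are hypothetical; the sketch records the lowest value of each excursion below the barrier and starts a new excursion only after the series has gone over the barrier again.

```python
def independent_minima(series, barrier):
    """Lowest value of each excursion of the time series below the barrier, sorted."""
    minima = []
    current = None                    # lowest value of the excursion in progress
    for value in series:
        if value >= barrier:
            if current is not None:   # the barrier was crossed: close the excursion
                minima.append(current)
                current = None
        elif current is None or value < current:
            current = value
    if current is not None:           # excursion still open at the end of the series
        minima.append(current)
    return sorted(minima)

# Made-up time series of E_2 values, for illustration only.
series = [0.45, 0.30, 0.15, 0.20, 0.42, 0.25, 0.13, 0.35, 0.41, 0.18]
found = independent_minima(series, barrier=0.4)
```

In this toy series three excursions are separated by crossings of the barrier 0.4, so three minima are recorded.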

Ideal efficiency of RC would be expected when energy barriers populate the region in- 
between the minima reached by RS and those reached by RC. This is due to the feature 
that RC climbs as enthusiastically uphill as downhill. It works by suppressing the statistical 
weight of configurations in-between extrema. In our example there are no strong indications 
of such barriers. The better performance of RC seems to be entirely due to the fact that it 
samples the rare configurations with low (and high) E2 far better than RS. In this sense the 
present case is too simple for RC. It remains to be explored whether NN with actual barriers 
between the RS region and the RC minima do exist. 

A (primitive) steepest gradient minimization program was applied to the RC as well as to
the RS configurations whose $E_2$ values are shown in figure 3. The purpose is to converge to
the local minimum closest to the starting configuration. After this minimization the average
$\overline{E}_2$ and best $E_{2,\min}$ values were

$$ \overline{E}_2 = 0.11831 \pm 0.00025 , \quad E_{2,\min} = 0.11586 \quad \text{for RC} $$

and

$$ \overline{E}_2 = 0.11896 \pm 0.00013 , \quad E_{2,\min} = 0.11788 \quad \text{for RS.} $$

The configurations thus found are called RC (or RS) minima in the following. Although the
difference in the mean value $\overline{E}_2$ is not very dramatic, it is notable that the first seven RC
minima are all lower than the best RS minimum.
Default settings of JETNET [12] return the value $E_2 = 0.11722$ [17]. This would put it at
position 5 in my set of RC minima. Running my minimization program on the configurations
produced by JETNET reduces this value further to $E_2 = 0.11620$. This is the second best
of all my solutions and was obtained with far less CPU time than the others. The point
of RC is clearly not to save CPU time. Instead the purpose is to provide a simple method
which allows one to explore, relatively hassle-free, relevant regions of configuration space.
Nowadays, it is normally a minor problem to find a fast workstation for a few days of MC
simulations. To program a complicated approach could be the real stumbling block. The
aim of an RC simulation is to gain increased confidence that relevant regions of configuration
space have not been overlooked. RS serves this purpose far less well, because the entropy
of the interesting regions tends to be very small. If, in addition, energy barriers separate
relevant minima from the high entropy region, RS with subsequent minimization may not
get to them at all. Adding Gaussian noise to minimization certainly helps, but the entropy
preference of such noise is the same as that of RS.

RC greatly suppresses the high entropy regions while, at the same time, being able
to climb up and down. Simulated annealing achieves the same purpose by varying the
temperature. (The function $E_2$ is then interpreted as the energy of a statistical mechanics
system. It should be noted that RS corresponds to infinite temperature, $\beta = 0$.) Here an
advantage of RC seems to be that it needs less detailed considerations. Parameter choices
like $\Delta f_{\min}$ or $x_{\max}$ are needed in both approaches. RC is then ready for a long run, as
eq. (3) automatically ensures a broad distribution (figure 1). In simulated annealing one
has to worry about a scheme for lowering (and possibly raising again) the temperature. In
many applications one may be unwilling to spend the work it takes to tune an annealing
scheme. Such a scheme is necessary, because statistical mechanics distributions are narrow
at any fixed temperature.

If desired, RC allows some tuning too. In particular, as we are interested in minima, one
may like to restrict the sampling region by introducing an upper bound $f_{\max}$. This should
be done in a smooth way. Figure 4 compares the $F(E_2)$ distribution functions for a sharp
versus a smooth upper bound $f_{\max} = 0.3$. The smooth bound is achieved by doubling the
$p^-$ value of equation (4) for $E_2 > 0.3$. It is clear that the simulation with the sharp upper
bound is the worse: it spends a large amount of CPU time on the immediate neighborhood
of $E_2 = 0.3$, because the updating stepsize $\Delta f$ approaches $\Delta f_{\min}$ there. The simulation with
the smooth upper bound moves far more freely in the $E_2 = 0.3$ neighborhood. Consequently,
it spends less CPU time there and still reaches distant configurations faster. It should be
noted that no major improvement over the simulation without upper bound was achieved.
For the smooth bound the minima yield $\overline{E}_2 = 0.11772 \pm 0.00012$ and $E_{2,\min} = 0.11680$.
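
A smooth upper bound of this kind amounts to a small change of the acceptance probability. In the Python sketch below the function name is hypothetical, and capping the doubled probability at one is an assumption of the sketch, since a doubled value may otherwise exceed one.

```python
def p_minus_smooth(df_minus, df_plus, f_value, f_max=0.3):
    """Downhill probability, doubled (and capped at 1) when f_value exceeds f_max."""
    p = df_plus / (df_plus - df_minus)      # unbiased random walk value
    if f_value > f_max:
        p = min(1.0, 2.0 * p)               # the cap at 1 is an assumption
    return p

below = p_minus_smooth(-0.02, 0.02, f_value=0.20)   # bound inactive
above = p_minus_smooth(-0.06, 0.02, f_value=0.35)   # bound active, 0.25 doubled
capped = p_minus_smooth(-0.02, 0.02, f_value=0.35)  # bound active, 0.5 doubled
```

Above the bound the walk is biased downhill but uphill steps remain possible whenever the doubled probability stays below one, in contrast to a sharp bound which rejects them outright.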

In difficult situations it may be worthwhile to try RC as one of various approaches. Each
method, simulated annealing [10], multicanonical annealing [8], genetic algorithms [11] or
RC, has its own specific way to explore configuration space. Which method wins is most
likely problem dependent. Presently there are no a priori criteria at hand to choose one
method over the others. Not spending too much of your own time may well favour RC.

4.2 Physical applications 

The physical purpose of finding many minima is to increase confidence in classifications
proposed by a NN. It is after all some kind of black box. At first glance, differences
between the twenty RC minima are rather small. To make the point, let us consider the RC
minima with lowest and highest $E_2$. In figure 5 distribution functions $F(Y_k)$, with $Y_k$ defined
by equation (1), for the event and background training data of these solutions ($E_2 = 0.11586$
and $E_2 = 0.11984$) are plotted. Events are the upper curves and background are the lower
curves.

It is seen that the smaller $E_2$ value comes from the fact that this solution concentrates
the background more efficiently into the $Y_k \to 1$ limit. The other solution concentrates
events more efficiently into the $Y_k \to 0$ limit. Apparently, the price paid is that also some
of the background events get placed into this limit, as is more clearly seen from the inlay.
Altogether one tends to conclude that the classifications are almost equivalent, the main
difference being that the entire curve is shifted with slight distortions of the shape. However,
one has to make sure that there is no internal re-ordering, i.e. identical data classified far
apart in different distributions of similar shape. To address this and other questions, it is
convenient to introduce some norms. Let $X = (X_i)$, $i = 1, ..., n$, with $0 \le |X_i| \le 1$; we define:

$$ ||X||_1 = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (X_i)^2 } , \quad ||X||_2 = \max\{ |X_i| , \ i = 1, ..., n \} \quad \text{and} \quad ||X||_3 = \frac{1}{n} \sum_{i=1}^{n} |X_i| . \qquad (5) $$

Relying on these norms various average distances were calculated and are reported in table 1. 
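
The three norms are read here as the root mean square, the largest absolute component and the mean absolute component. Under that reading, the Python sketch below (hypothetical function names, fixed random seed) implements them and estimates the average distance between two independent random vectors with 31 components, each component uniform in [0, 1).

```python
import math
import random

def norm1(x):
    """Root mean square of the components."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def norm2(x):
    """Largest absolute component."""
    return max(abs(v) for v in x)

def norm3(x):
    """Mean absolute component."""
    return sum(abs(v) for v in x) / len(x)

def random_distance(norm, n, trials, rng):
    """Monte Carlo average of norm(u - v) for uniform [0,1) vectors u, v of length n."""
    total = 0.0
    for _ in range(trials):
        total += norm([rng.random() - rng.random() for _ in range(n)])
    return total / trials

rng = random.Random(12345)
estimates = [random_distance(norm, 31, 4000, rng) for norm in (norm1, norm2, norm3)]
```

The Monte Carlo estimates should come out close to the random vector entries of table 1; for the third norm the exact value is $1/3$, the mean absolute difference of two uniform random numbers.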

Let us first address the distances in parameter space. The parameters of the twenty RC
minima are denoted by

$$ x^{(s)}_{\min} = (x^{(s)}_1, ..., x^{(s)}_{31}) \quad \text{where} \quad s = 1, ..., n_s , $$

and $n_s = 20$ for the RC results. The second column of table 1 gives

$$ \langle || x^{(s)}_{\min} - x^{(s)}_{\rm select} || \rangle = \frac{1}{n_s} \sum_{s=1}^{n_s} || x^{(s)}_{\min} - x^{(s)}_{\rm select} || , $$

the average distance of the minima $x^{(s)}_{\min}$ away from the starting values $x^{(s)}_{\rm select}$, which are
selected from the RC simulation runs before local minimization is applied. When calculating
the norms each $x^{(s)}$ component is first rescaled from the $[-10, 10]$ range into $[-0.5, 0.5]$,
because a range of length one is used in the definition (5). The error bars in parentheses
apply to the last digits and correspond to our statistics of twenty solutions. Column two
of table 1 should be compared with column four, where the (up to the given digits) exact
average distance of 31-component independent random vectors $x^{(s)}_{\rm ran}$ and $x^{(t)}_{\rm ran}$ is written
down. For these vectors each component is a uniformly distributed random number in the range
$[0, 1)$. As expected, the found minima $x^{(s)}_{\min}$ are fairly close to the parameters $x^{(s)}_{\rm select}$
selected from the RC simulation.

Column three collects the average distances

$$ \langle || x^{(s)}_{\min} - x^{(t)}_{\min} || \rangle = \frac{1}{n_s (n_s - 1)} \sum_{s=1}^{n_s} \sum_{t \ne s} || x^{(s)}_{\min} - x^{(t)}_{\min} || $$

between different minima. There are $19 \cdot 20 / 2 = 190$ different combinations, to which the
average values correspond. As only twenty are independent, the error bars are obtained as
^(j/lQ from the estimated variances. It is seen that these distances are close to the distances 
between random vectors. A plot of all parameter values found is given in figure 6 and looks
very similar to a plot where uniform random numbers in the $[-10, 10]$ range are drawn for
each parameter. It goes beyond the scope of this paper to analyse for correlations.
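
The average over all pairs of minima is easily written down in Python. In this sketch the function names are hypothetical and the twenty "minima" are stand-ins drawn uniformly at random and rescaled to a range of length one, so it only reproduces the random vector reference case, not the results of this paper.

```python
from itertools import combinations
import random

def norm3(x):
    """Mean absolute component."""
    return sum(abs(v) for v in x) / len(x)

def mean_pair_distance(vectors, norm):
    """Average of norm(u - v) over all unordered pairs, and the number of pairs."""
    dists = [norm([a - b for a, b in zip(u, v)])
             for u, v in combinations(vectors, 2)]
    return sum(dists) / len(dists), len(dists)

rng = random.Random(7)
# Stand-in data: 20 vectors with 31 components in [-10, 10], rescaled to [-0.5, 0.5].
fake_minima = [[rng.uniform(-10.0, 10.0) / 20.0 for _ in range(31)] for _ in range(20)]
mean, n_pairs = mean_pair_distance(fake_minima, norm3)
```

Summing over unordered pairs gives the same average as the double sum over ordered pairs, because the distance is symmetric in the two vectors.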

Let us turn to distances in function space. The vector components are then of the form
$Y^{(s)}_k$. For columns five and six $s$ and $t$ label again our twenty RC minima and $k = 1, ..., 5000$
labels the training data. In column seven results for a random vector with 5000 components
are reported.

For the average distances $\langle || Y^{(s)}_{\rm sort} - Y^{(t)}_{\rm sort} || \rangle$ the $Y^{(s)}_k$ and $Y^{(t)}_k$ components are
sorted in increasing order before the norms are calculated. These distances are characteristic
for the differences seen in plots like figure 5 (but note that now all data are accommodated in
one distribution function). For all three norms these distances turn out to be about 5% of the
distances $\langle || Y^{(s)}_{\rm ran} - Y^{(t)}_{\rm ran} || \rangle$ which one encounters for random vectors of length 5000.
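
Why sorted distances cannot detect re-ordering is seen from a toy example. In the Python sketch below the function names and the two output vectors are made up: both vectors contain the same values, assigned to different data points, so the sorted distance vanishes while the direct distance is large.

```python
def norm2(x):
    """Largest absolute component."""
    return max(abs(v) for v in x)

def norm3(x):
    """Mean absolute component."""
    return sum(abs(v) for v in x) / len(x)

def direct_distance(y_s, y_t, norm):
    """Distance between two network outputs, compared data point by data point."""
    return norm([a - b for a, b in zip(y_s, y_t)])

def sorted_distance(y_s, y_t, norm):
    """Distance after sorting both outputs, i.e. between their distribution functions."""
    return direct_distance(sorted(y_s), sorted(y_t), norm)

# Made-up outputs of two solutions for four data points.
y_s = [0.1, 0.9, 0.5, 0.3]
y_t = [0.9, 0.1, 0.3, 0.5]
d_sorted = sorted_distance(y_s, y_t, norm3)
d_direct = direct_distance(y_s, y_t, norm3)
d_worst = direct_distance(y_s, y_t, norm2)
```

Identical distribution functions are therefore compatible with quite different classifications of individual data points, which is why both kinds of distance are reported in table 1.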

The average distances $\langle || Y^{(s)} - Y^{(t)} || \rangle$ are the quantities of physical interest: they relate
directly to differences in the classification of our training data. For norm one and norm
three the numbers are about two times those of the corresponding sorted vectors, i.e. about
10% of the random vector results. However, a few data points behave exceptionally. The
result for norm two means that the worst average re-ordering amounts to about 58%
of the function value range ($0 < Y_k < 1$). As these are averages taken over the 190
possible combinations of our solutions, the re-ordering of certain data with respect to two
different network solutions is even worse. Indeed the largest re-ordering encountered is
$|| Y^{(14)} - Y^{(10)} ||_2 = 0.91$, whereas the best of the worst is a $|| \cdot ||_2$ distance of 0.21. For the
solution $Y^{(0)}$ found via JETNET and the best RC minimum we have $|| Y^{(1)} - Y^{(0)} ||_2 = 0.31$.
Finally for the solutions depicted in figure 5 the value is $|| Y^{(20)} - Y^{(1)} ||_2 = 0.57$. These results
show that the (slight) increase of $E_2^{(s)}$ with $s$ for $s = 1, ..., 20$ seems to be irrelevant for the
reordering effect. Instead the internal structure of the solutions should be held responsible.

The small averages obtained with the other two distance definitions imply
that large re-ordering happens only for a few data points. This is confirmed by plotting the
distribution function of the $\langle | Y^{(s)}_k - Y^{(t)}_k | \rangle$ average in figure 7. For 98% of the data
$\langle | Y^{(s)}_k - Y^{(t)}_k | \rangle$ is less than 0.1. It should be noted that the highest value for
$\langle | Y^{(s)}_k - Y^{(t)}_k | \rangle$ is lower than $\langle || Y^{(s)} - Y^{(t)} ||_2 \rangle$ of the table, because the $k$ values
for which the largest value is obtained depend on $s$ and $t$. Of course, it is no problem to
identify the individual data points which are subject to large re-ordering. It may be
interesting, but is beyond the scope of this paper, to investigate whether they exhibit
particular physical characteristics.

To what extent can one now trust a classification proposed by the NN? The worst case
scenario combines different solutions in the following way:

$$ W^n_k = \max\{ Y^{(s)}_k \, | \, s = 1, ..., n \} \quad \text{for events} $$

and

$$ W^n_k = \min\{ Y^{(s)}_k \, | \, s = 1, ..., n \} \quad \text{for background.} $$

Here a cut-off on the maximum allowed $E_2$ value has to be set. A value of the order of
a few percent seems to be reasonable. In the situation at hand, the $\Delta E_2$ difference between
solutions # 1 and # 20 is about 3.5%. Figure 8 shows what happens when solutions
$s = 1, ..., 20$ are successively combined according to the worst case scenario. In the region
$0.1 < W^n_k < 0.9$ results apparently become stable. However, in the extreme limits (i.e. for a
small amount of data) crossover effects between classification as event versus background are
found. The results suggest that one should not apply this NN in these limits.
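
The worst case combination can be sketched in Python as follows. The function name and the toy outputs are hypothetical. Since background is trained towards 1 and events towards 0, the least favorable output is the largest one for an event and the smallest one for a background data point.

```python
def worst_case(outputs, is_event):
    """Combine solutions: outputs[s][k] is the output of solution s for data point k."""
    combined = []
    for k, event in enumerate(is_event):
        values = [solution[k] for solution in outputs]
        combined.append(max(values) if event else min(values))
    return combined

# Made-up outputs of two solutions for two background and two event data points.
outputs = [[0.90, 0.80, 0.10, 0.20],
           [0.70, 0.95, 0.30, 0.05]]
is_event = [False, False, True, True]
w = worst_case(outputs, is_event)
```

Adding further solutions can only move each combined value in the unfavorable direction, so the combined distribution functions change monotonically as more solutions are included.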

5 Summary and Conclusions 

RC simulations sample ergodically through configuration space, while greatly enhancing
(as compared to RS) the likelihood of configurations in the neighbourhood of minima (or
maxima). The updating scheme employed in this paper is considerably improved over the
version of [9]. Further significant progress in this direction seems to be likely.

A large number of practically independent local minima may be obtained by combining 
RC simulations with subsequent minimization. Many regions of configuration space are thus 
covered and barriers between them can be overcome. This increases the confidence that 
best solutions are not incidentally overlooked. In the present case many, almost degenerate, 
inequivalent minima are found. 

For physical applications the central question is whether degenerate minima lead to identical
classifications of the data. In the case at hand we find that this is to a limited extent the
case. A small fraction (<2%) of the data exhibits fairly unpredictable re-ordering behavior.
To be on the safe side, one may combine several network solutions according to a worst case
scenario.

Acknowledgements: I would like to thank Harrison Prosper and Jeff McDonald for 
valuable discussions and for supplying me with the data used in this paper. The data, the 
final parameters and the RC Fortran program are available through e-mail to the author. 

Norm          $\langle||x_{\rm select} - x_{\min}||\rangle$   $\langle||x_{\min} - x_{\min}||\rangle$   $\langle||x_{\rm ran} - x_{\rm ran}||\rangle$   $\langle||Y_{\rm sort} - Y_{\rm sort}||\rangle$   $\langle||Y - Y||\rangle$   $\langle||Y_{\rm ran} - Y_{\rm ran}||\rangle$
$||.||_1$     0.0253 (28)      0.361 (11)      0.4059      0.0217 (17)      0.0528 (22)      0.4082
$||.||_2$     0.0754 (75)      0.794 (23)      0.8427      0.0457 (34)      0.579 (37)       0.9875
$||.||_3$     0.0168 (22)      0.295 (10)      1/3         0.0180 (15)      0.0363 (16)      1/3

Table 1: Average distances between various vectors in parameter and function space.

Figure 1: RC and RS distribution functions $F(E_2)$. For small $E_2$ only
the RC curves exhibit the desired steep slope.




Figure 2: Time series for a RC simulation. For each thousand subsequent 
data points the minimum and maximum E2 values are connected by 
straight lines. 




Figure 3: Independent minima reached by 100,000 RC updates in com- 
parison with those from 10^ independent RS configurations. 
Figure 4: RC distribution functions $F(E_2)$ with a sharp versus a smooth
upper bound $f_{\max}$ imposed.




Figure 5: Distribution functions $F(Y_k)$ for event and background training
data corresponding to the RC minima # 1 and # 20.

Figure 6: Parameters $x_j$, $j = 1, ..., 31$ for all twenty RC minima.




Figure 8: Distribution functions $F(W^n_k)$ for event and background
training data obtained by combining RC minima according to the worst
case scenario.



